This report explores a dataset of White Wines in depth.
Load the Packages
Univariate Plots Section
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
##  X              fixed.acidity    volatile.acidity citric.acid
##  Min.   :   1   Min.   : 3.800   Min.   :0.0800   Min.   :0.0000
##  1st Qu.:1225   1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700
##  Median :2450   Median : 6.800   Median :0.2600   Median :0.3200
##  Mean   :2450   Mean   : 6.855   Mean   :0.2782   Mean   :0.3342
##  3rd Qu.:3674   3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900
##  Max.   :4898   Max.   :14.200   Max.   :1.1000   Max.   :1.6600
##  residual.sugar   chlorides         free.sulfur.dioxide
##  Min.   : 0.600   Min.   :0.00900   Min.   :  2.00
##  1st Qu.: 1.700   1st Qu.:0.03600   1st Qu.: 23.00
##  Median : 5.200   Median :0.04300   Median : 34.00
##  Mean   : 6.391   Mean   :0.04577   Mean   : 35.31
##  3rd Qu.: 9.900   3rd Qu.:0.05000   3rd Qu.: 46.00
##  Max.   :65.800   Max.   :0.34600   Max.   :289.00
##  total.sulfur.dioxide   density          pH              sulphates
##  Min.   :  9.0          Min.   :0.9871   Min.   :2.720   Min.   :0.2200
##  1st Qu.:108.0          1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100
##  Median :134.0          Median :0.9937   Median :3.180   Median :0.4700
##  Mean   :138.4          Mean   :0.9940   Mean   :3.188   Mean   :0.4898
##  3rd Qu.:167.0          3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500
##  Max.   :440.0          Max.   :1.0390   Max.   :3.820   Max.   :1.0800
##  alcohol         quality
##  Min.   : 8.00   Min.   :3.000
##  1st Qu.: 9.50   1st Qu.:5.000
##  Median :10.40   Median :6.000
##  Mean   :10.51   Mean   :5.878
##  3rd Qu.:11.40   3rd Qu.:6.000
##  Max.   :14.20   Max.   :9.000
This dataset includes 4898 observations of White Wines, with 11 input
variables and 1 output variable (Quality).
Remove the unneeded variable X.
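A minimal sketch of this step, assuming the data has been loaded into a data
frame called `wine` (a hypothetical name; the three toy rows below are
illustrative only and exist just so the snippet runs on its own):

```r
# Toy stand-in for the loaded data; the name `wine` and these rows are
# assumptions for illustration, not the real dataset.
wine <- data.frame(
  X = 1:3,
  fixed.acidity = c(7.0, 6.3, 8.1),
  quality = c(6L, 6L, 5L)
)

# X is only a row index, so drop it before the analysis.
wine$X <- NULL

names(wine)
```

Assigning `NULL` to a column removes it in place, which keeps the remaining
columns in their original order.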


Two quick histograms show the frequency of alcohol and citric acid values.
These plots also show a mostly normal distribution.


The residual.sugar distribution is right-skewed, with most values
concentrated at the low end (less sweet White Wines?), and the alcohol plot
is pretty spread out.
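The direction of skew can also be checked numerically. In the summary above,
residual.sugar has a mean (6.391) above its median (5.200), which points to a
long right tail. The sketch below shows the same check on a small made-up
vector; `skewness` is a helper defined here for illustration, not a base R
function.

```r
# Made-up residual sugar values: mostly low, with a few large ones.
residual_sugar <- c(1.2, 1.5, 1.6, 2.0, 2.4, 4.8, 5.2, 7.5, 12.0, 20.5)

# Moment-based sample skewness: mean cubed deviation divided by the
# cubed (population-style) standard deviation.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / (mean(d^2)^1.5)
}

# A mean above the median suggests a long right tail.
mean(residual_sugar) > median(residual_sugar)

# A positive value indicates right (positive) skew.
skewness(residual_sugar)
```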

There is an interesting spike in citric acid around .5.
Univariate Analysis
What is the structure of your dataset?
The dataset is made up of 4898 observations of White Wines with 11 inputs
(fixed acidity, volatile acidity, citric acid, residual sugar, chlorides,
free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol)
and 1 output (quality).
What are the main features of interest in your dataset?
I think the main features of this White Wine dataset are Alcohol (%) and
Residual Sugar. They are the 2 main variables that do not appear to have a
normal distribution.
What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?
Chlorides, Volatile Acidity, and total sulfur dioxide seem to play a smaller
part in the quality of the White Wine. Citric acid also has an interesting
spike around .5.
Did you create any new variables from existing variables in the dataset?
I created a new variable called quality_fac, a factor version of quality, to
group observations by quality in some of my plots and to show better
visualizations of the data in the Multivariate Section below.
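One way such a variable could be built is sketched below. This is an
assumption about the approach, not necessarily the exact code used for this
report; the data frame name `wine` and the toy quality scores are made up.

```r
# Toy quality scores; illustrative only.
wine <- data.frame(quality = c(6L, 5L, 7L, 3L, 9L, 6L))

# Quality ranges from 3 to 9 in the summary above, so use those as the
# ordered levels. Levels with no observations are kept.
wine$quality_fac <- factor(wine$quality, levels = 3:9, ordered = TRUE)

levels(wine$quality_fac)
table(wine$quality_fac)
```

Treating quality as an ordered factor lets plotting functions map it to
discrete colors or separate boxes instead of a continuous scale.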
Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the
Bivariate Plots

I used ggpairs to look for apparent correlations in the data. There appear
to be correlations between a number of variables in the dataset.
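The pairwise correlations behind a plot like this can also be computed
directly with base R's `cor()`. The sketch below uses a small made-up data
frame (the name `wine` and all values are illustrative assumptions), so the
numbers it prints are not results from the real dataset.

```r
# Made-up values for three of the variables; illustrative only.
wine <- data.frame(
  residual.sugar = c(1.6, 20.7, 6.9, 8.5, 1.2, 11.0),
  alcohol        = c(9.5, 8.8, 10.1, 9.9, 12.4, 9.0),
  density        = c(0.9940, 1.0010, 0.9951, 0.9956, 0.9900, 0.9975)
)

# Pearson correlation between every pair of columns, rounded for reading.
cor_matrix <- round(cor(wine, method = "pearson"), 2)
cor_matrix
```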
The best quality White Wines seem to have a pH of 3.0 to 3.5, alcohol content
of between 10 and 13, medium to high levels of citric acid (.25-.5), and low
residual sugar.
They also seem to have a lower density, lower chlorides (.2 - .6), and
somewhat lower fixed acidity.
There is definitely a linear relationship between density and total sulfur
dioxide as seen in the plot above, as well as a negative linear relationship
between alcohol and density. After comparing density and residual sugar,
there appears to be a very strong linear relationship there also.
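A linear relationship like the one between alcohol and density can be
quantified with a simple least-squares fit. This is a sketch on made-up
values (the data frame name `wine` and the numbers are assumptions), meant
only to show how the sign of the slope captures the direction of the
relationship.

```r
# Made-up alcohol and density values; illustrative only.
wine <- data.frame(
  alcohol = c(8.8, 9.0, 9.5, 9.9, 10.1, 12.4),
  density = c(1.0010, 0.9975, 0.9940, 0.9956, 0.9951, 0.9900)
)

# Least-squares fit of density on alcohol.
fit <- lm(density ~ alcohol, data = wine)

# A negative slope means density falls as alcohol rises.
coef(fit)

# Share of the variance in density explained by the straight line.
summary(fit)$r.squared
```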
Bivariate Analysis
Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?
The ggpairs plot was pretty interesting because it put everything together
and showed correlations between the variables. The biggest were the
correlations between density and residual sugar and between density and
alcohol.
The higher quality White Wines have a medium pH between 3 and 3.5, a higher
alcohol content, a mid-high citric acid level, lower residual sugar, lower
chlorides, lower density, and somewhat lower fixed acidity.
Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?
Density seems to be closely related to residual sugar and fixed acidity.
Alcohol also seems to be related to density and total sulfur dioxide, and it
appears that the higher the alcohol content, the less dense the White Wine is.
What was the strongest relationship you found?
The strongest relationship appears to be between Density and Residual Sugar.
Multivariate Plot Section
Adding factor to quality for ranking purposes.

The higher quality White Wines seem to have more alcohol and lower density
than the lower quality White Wines.


Neither plot shows a real correlation between sulphates and total or free
sulfur dioxide, or a clear pattern with the quality of White Wine. I was
curious because, according to the description of attributes, sulphates tend
to contribute to sulfur dioxide levels.

Volatile Acidity seems to play a role in the quality of White Wine, as does
Fixed Acidity to a lesser extent.

Quality White Wines have an above average level of citric acid, lower level of
chlorides, higher level of alcohol, and medium to high level of fixed acidity
compared to the lower quality White Wines.

This plot again shows the Quality of White Wines, but this time with the
Density and Alcohol axes switched around, showing the strong linear
relationship as well as the quality factor.


This is a better look at the different quality White Wines compared to
Density and Alcohol, first together and then separately.
All three of these boxplots seem to back up my findings that higher citric
acid, lower density (higher alcohol content), and lower chlorides make a
better quality White Wine.
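The numbers a boxplot draws (whisker ends, quartiles, and median per group)
can be pulled out without plotting by calling `boxplot()` with
`plot = FALSE`. The sketch below uses made-up alcohol values for three
quality levels; the data frame name `wine` and the values are illustrative
assumptions.

```r
# Made-up alcohol values for three quality levels; illustrative only.
wine <- data.frame(
  quality = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  alcohol = c(9.2, 9.5, 9.8, 10.0, 10.4, 10.9, 11.2, 11.8, 12.3)
)
wine$quality_fac <- factor(wine$quality)

# Compute the boxplot statistics per quality level without drawing.
bp <- boxplot(alcohol ~ quality_fac, data = wine, plot = FALSE)

# Row 3 of `stats` holds the median of each group, in the order of `names`.
group_medians <- setNames(bp$stats[3, ], bp$names)
group_medians
```

Comparing the medians across quality levels gives a quick numeric check on
the trend the boxes show visually.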

Density seems to be closely tied to the alcohol content as well as possibly
total sulfur dioxide. The higher the alcohol content the less dense the
White Wines appear to be.
Multivariate Analysis
Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?
I was surprised to find that the higher quality White Wines seem to have a
higher alcohol content, which in turn means a lower density. I thought that
the opposite would be true due to the taste of alcohol.
I was also surprised to find that the higher quality White Wines had a medium
to high level of citric acid as well as low levels of chlorides (salt).
Were there any interesting or surprising interactions between features?
I thought it was definitely interesting that as the alcohol content goes
up the density goes down.
OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.
I did not create a model.
Final Plots and Summary
Plot One

The Density and Residual Sugar of the White Wines have a strong linear
relationship, as shown in the above plot.
Plot Two

The higher quality White Wines have a higher alcohol content and lower
density than the lower quality White Wines. It also shows a strong linear
relationship between density and alcohol.
Plot Three



All three of these boxplots seem to back up my findings that higher citric
acid, lower density (higher alcohol content), and lower chlorides make a
better quality White Wine.
Reflection
This dataset contained 4898 observations of White Wine Quality with 11
inputs and 1 output. After exploring the data in detail I can say for certain
that I know a lot more about Wine than I ever have. At first I concentrated
strictly on which variables are needed to make a high quality wine; after
more research I started wondering how the variables relate to one another.
I was very surprised to find that density is closely related to alcohol
content as well as residual sugar. The denser the wine was, the less alcohol
it contained.
After examining the sulphates and sulfur dioxide I was very surprised to
learn that they are not closely correlated, since the description of
attributes mentions that sulphates can contribute to sulfur dioxide gas
levels. Density and total sulfur dioxide also appear to have a linear
relationship that could be examined further.
I think there is an opportunity for further understanding of the makeup of
a good quality wine with more data on a wider range of white wines. Breaking
up the data into the 7 classes of whites would also give more understanding
of how the different variables that make up the wines interact and come
together to form a high quality wine.